Main data aquisition
FOSSILPOL
Acquiring the main pollen dataset compilation for our analyses is
done a priori. Raw pollen datasets are carefully selected using the
R-Fossilpol package and the guidelines to the workflow are well
described in Flantua et al. 2023 and our website Fossilpol
project. Most of the compilation of datasets are available through
the Neotoma Paleoecology
Database. Some additional datasets are from private owners in
regions with lacking data, and these have restricted access for use. We
do not have the intellectual property rights to make them public
available. Therefore, only the derivatives of the analyses will be
possible to share openly. Table 1 provides the overview of the the
settings used in the FOSSILPOL workflow to get the standardised datasets
for this project.
Harmonisation tables
An important step in FOSSILPOL to obtain a standardised pollen
dataset within and across regions is harmonisation of pollen types.
There are different analysts with different schools and background, and
the nomenclature can vary widely. To be able to make numerical
comparisons across different pollen records, the level of pollen
taxonomy should be similar. As a result, pollen harmonisation tables are
produced for different regions, attempting to minimise biases related to
this. The regional harmonisation tables used in our project are for
Europe, Levant, Siberia, Southern Asia, Northern America, Latin America,
Africa, and the Indo-Pacific region (Birks et al. harmonisation paper).
These tables can be downloaded from (xxx), and are used as input in the
Fossilpol workflow above (see
Fossilpol step_by_step guide).
Table 1: Selection of settings applied in FOSSILPOL
|
Setting type
|
Selection
|
|
General
|
|
geography_criteria
|
-180
|
|
long_max
|
180
|
|
lat_min
|
-90
|
|
lat_max
|
90
|
|
alt_min
|
NA
|
|
alt_max
|
NA
|
|
private_data
|
TRUE
|
|
Neotoma
|
|
dataset_type
|
pollen
|
|
sel_var_element
|
pollen
|
|
chron_order.type1
|
Varve years BP
|
|
chron_order.type2
|
Calibrated radiocarbon years BP
|
|
chron_order.type3
|
Calendar years BP
|
|
chron_order.type4
|
Radiocarbon years BP
|
|
chron_order.type5
|
Calendar years AD/BC
|
|
chron_order.type6
|
NA
|
Age-depth models
|
|
min_n_of_control_points
|
3
|
|
default_thickness
|
TRUE
|
|
default_error
|
100
|
|
max_age_error
|
3000
|
|
guess_depth
|
10
|
|
default_iteration
|
10000
|
|
default_burn
|
2000
|
|
default_thin
|
8
|
|
iteration_multiplier
|
5
|
|
Site filtering
|
|
pollensum.filter_by_pollen_sum
|
TRUE
|
|
pollensum.min_n_grains
|
25
|
|
pollensum.target_n_grains
|
150
|
|
pollensum.percentage_samples
|
50
|
|
filter_by_age_limit
|
TRUE
|
|
extrapolation.filter_by_extrapolation
|
TRUE
|
|
extrapolation.maximum_age_extrapolation
|
3000
|
|
extrapolation.filter_by_interest_region
|
TRUE
|
|
extrapolation.n_levels.filter_by_number_of_levels
|
TRUE
|
|
extrapolation.n_levels.min_n_levels
|
5
|
|
extrapolation.use_age_quantiles
|
TRUE
|
|
extrapolation.use_bookend_level
|
TRUE
|
Workflow for HOPE Hypothesis 1
We use the targets
R-package to produce a reproducible workflow for all data processing,
analyses, and visualisation of the results from this project. For this
we created a Github repository with a data analysis R-project called HOPE_hypothesis1
in Github that contains all data, metadata, and R functions needed
to run this R-project.
Our output of targets is set up with in an external storage at UiB
OneDrive where we save the main targets folder with data and meta data,
while the targets script and functions are saved in our Github
project.
In case anyone wants to run the workflow, note that first time
running targets without access to our main data folder will take time.
The targets are split up in many steps with several specialised
functions that is made to loading necessary data, to get estimates of
various variables or structuring various subsets of data step wise. This
is to avoid the need for re-running major parts that take too long when
the data is already processed. There is a dependency in the list of
targets that the in the end to run the final analyses, all other targets
need to be up-to-date. Targets will detect any changes made in the
functions. In such case, targets will automatically rerun the parts that
are dependent on this change, but at the same time skip all the parts
that are up-to-date.
The file structure in Github is
## levelName
## 1 HOPE_Hypothesis1
## 2 ¦--___Init_project___.R
## 3 ¦--_targets.R
## 4 ¦--_targets_packages.R
## 5 ¦--HOPE_Hypothesis1.Rproj
## 6 ¦--R
## 7 ¦ °--functions
## 8 ¦ ¦--climate
## 9 ¦ ¦--data_wrangling
## 10 ¦ ¦--events
## 11 ¦ ¦--hvarpart
## 12 ¦ ¦--modelling
## 13 ¦ ¦--PAPs
## 14 ¦ ¦--spd
## 15 ¦ °--visualisation
## 16 °--renv
All that is needed when the R-project is set up, is to run the
scripts ___init_project.R___ and _targets.R.
This will install and load all the packages that are needed.
Temporarily figure (need to make a conceptual figure to be set in
here) 
The targets are arranged in a order to prepare all the data needed
for the main analysis in the end. - 1) The first steps of the workflow
imports and load all data needed for the pre-processing data steps and
analyses such as the final HOPE dataset compilation and other input
tables e.g human event-types for different regions. - 2) This is
followed by data processing for each of the explanatory and response
variables in H1: –a) The largest pre-processing of data starts with
preparation of the explanatory variable detecting past human presence
and impact. This is a major analyses in itself. –b) This follows by data
extraction of palaeo-climate from the CHELSA paleoclimate database. This
is modeled palaeo-climate data for each of the geographical location of
the pollen records. First time the function is run, it will download the
data from a URL connection, and extract data for the climatic variables
selected, and deleted the data that is not needed to save storage in
local computer. –c) It is followed by different targets that process all
estimates of the response variables selected to get measures of
ecosystem properties. In general, all data such as raw estimates and
interpolated data are kept to allow careful checking and validation of
output. -3) In the end, data is structure to fit the main analysis. This
analysis is divided into two parts which we call the –a) the spatial
(within core) analysis, and –b) the temporal analysis (a
spatial or between core/sample analysis per timestep ca. every
500 years). Several choices are made, which are described in the detail
in the text below.
Data filtering
The main data are divided into what is named
data_assemblyin targets which store all the pollen records
and chronologies, and what is named the data_meta which
contain the general site information. We applied a set of filtering
criteria to get as high data quality as possible to be able compare the
numerical estimates on standardised datasets. These filtering criteria
are: removing potentially duplicated pollen records, sorting levels
(samples) by ages, filtering out levels (samples) based on a threshold
of total number pollen grains counted (= pollen sum), filtering out
sequences (pollen records) based on age limits (minimum and maximum age
ranges), filtering out levels (samples) by the last control point,
filtering out samples beyond the age limits of interest, and filtering
out pollen records based on the total number of samples (N).
This filtering is done on the chronologies, raw pollen counts,
harmonised pollen counts, and the age uncertainties related to the
chronologies. The preferable number of minimum pollen grains is set to
150, but this led to a great loss of datasets in regions with less data
coverage, and we therefore reduced this number to 25, if less than 50 %
of the samples had a low count, as low pollen sum can be due to varying
pollen concentrations in different parts of a record. This allow us to
keep more datasets, but in these cases the pollen records have a low
pollen sum, we acknowledge that the estimates of pollen assemblage
properties (PAPs) are less robust (PENDING: check/report how many
cases..). The maximum age beyond extrapolation is set to 3000 years.
Ages extrapolated beyond this threshold is considered highly uncertain.
Also all pollen records with less than 5 samples are removed for further
analyses. The final HOPE dataset compilation used further in our project
is called the data_assembly_filtered.
Detection of past human presence
In order to detect the impact of past humans on fundamental ecosystem
properties we needed to develop indicators of past human presence and
activity. This led to the development of a new approach (see next
figure) where we use detection of human events and indicator identified
from pollen records based on expert knowledge (left and middle), in
combination with the methodology of quantifying presence of humans based
on radiocarbon dates from archaeological artefacts and Summed
Probability Densities (SPD, right) (Bird et al. 2022). In our view this
solved some issues where we can use one standardised variable as an
indicator of past human impact, and partly avoid the difficulties of
creating standardised variables for detecting human disturbance events
in different regions and across continent, and reduce the circularity of
detecting human events on the same pollen records as the estimates of
ecosystem properties.
.
Detection of human events
For each pollen record, we have detected periods of human presence
from the pollen data. Two methods have been used: i) detection from
pollen diagrams (North America, Europe, Asia, Indopacific; ii) detection
using indicator taxa (Latin America).
Detection of human events in pollen diagrams
First, a pollen diagram of each pollen record has been examined by a
regional expert and the age of each event type has been recorded.
PENDING: Add more text about how the event types have been
defined…..
Table 2: Type of human events identified in pollen diagrams
|
Region
|
Event.type
|
|
North America
|
BI = Before Impact; FC = First Cultivation; ES = European Settlement
|
|
Europe
|
BI = Before Impact; FI = First Indication; FCu = First Cultivation; EC =
Extensive Clearance; CC = Complete Clearance
|
|
Asia
|
BI = Before Impact; FI = First Indication; FCu = First Cultivation; EI =
Extensive Impact
|
|
Indopasific
|
no_impact = No Impact; weak = Weak Impact; medium = Medium Impact;
strong = Strong Impact
|
Note that the event types are uniquely defined within continents, and
event types with the same name have different meanings between
continents.
Second, an algorithm is made to get the binary variables (0/1) per
event type identified in each pollen record. A new vector with values of
average ages in between levels (samples) of the identified event type
was created because when timing of events are assumed to have occurred
before the changed identified. A new matrix where we use the new age
vector with the events type were made, assigning the different event
type binary values 0 or 1 for depending if the event type is present
(i.e. when age > detected age for the specific event type). Finally,
logical rules were applied and the binary values were adjusted for each
region to get the event data for each pollen record. If there is no
human event identified, it does not mean humans were absent, but that is
was not possible to identify human activity from the pollen records.
Below is a summary figure that intends to visualise the underlying
raw data of human events. For simplicity the data are aggregated by
ecozones and regions, and not individual datasets. The different colours
display the human event-types within each region. The raw event types is
represented by presence/absence data (the circle dots on the y-axis 0
and 1 ). The smooth trend lines represent simple binomial GAM models for
each ecozone that show the main trends and changes in timing of events
over time. For example to interpret this figure, it show that in the
Cold ecozone in Europe, human impact (bi = before impact) was not
detected in any dataset before ca 7000 years BP, whereas bi
then declines steadily over time, other event types identified starts to
become evident. First indication (fi) of human disturbance starts to
increase around 7500 years BP when bi is declining. More and
more samples with first human indication increases steadily across this
ecozone up to ca 2500 years BP before it starts to decline. After that
the other events types are becoming apparent such as increase in first
cultivation (fc), followed by increasing numbers of identified signs of
early clearance (approximately 3000 years BP). Complete clearance is
only identified the last 2000 years where it increases up to the present
time (but it is not extensive across the pollen records, highest
proportions around 0.25).




PENDING: In North America, Asia, and Oceania there is an extra panel
entitled NA. It contains a few datasets without defined
ecozone. In addition, from the figure you can see that the region
Oceania contains the var_name of the events type identified in
Asian datasets. This indicate that there must be a dataset or two that
originally was compiled for Asia that is now in Oceania (PENDING: to be
identified).
Detection using indicator taxa
An extensive literature review has been done to detect fossil pollen
taxa that can be associated with human activity (PENDING: Add the two
tables of groups of indictors and single indictors). The aggregation of
indicator groups vary across the continent, both between regions and
countries, and the combination of indicator groups for pollen records
within the area of expertise. An algorithm has been made to assess each
pollen records, which goes through the presence of taxa in combinations
of the recorded indicator groups, and detect if the signal of pollen
indicator is based on a defined threshold.
Two categories were created: i) ‘human event indicators’: a single
taxa associated with human activity classified as ‘weak’ or ‘strong’
impact; ii) ‘human event indices’: a combination of particular taxon is
classified as ‘weak’ or ‘strong’ impact. For each level of each pollen
record, all indicators and indices were tested for its presence in the
level and whole level we classified as “no impact”, “weak impact” or
“strong impact”. Specifically, when one pollen grain of a human
indicator is present, it is assigned as ‘weak’ and when more than 1
pollen grain of a strong impact indicator is present we assign it as
‘strong’. In addition, the pollen-type Pinus was only
considered as a human indicator outside of its native distribution (see
XXX?). For indices, any number of pollen grain needs to present for that
specific combination.
Summary figure of the events using indicator taxa in different
ecozones in Latin America:

Arcehological artefacts and Summed Probability
Densities
We used the global dataset of radicarbon dates (RC dates) of
archeological artefacts from Bird et al 2022. The quantification of SPD
requires a distance to be selected around each site location to collect
the relevant dates of archaeological artefacts around it. This will
limit the area of human presence and indirectly the amount of human
activity relevant to pollen records from each site.
Only RC dates with valid geographical location (longitude and
latitude), and ‘LocAccuracy’ > 0 were selected. For each pollen
record, RC dates were classified by the geographical distance to the
pollen record. The chosen distance classes were: 5, 25, 50, 100, 250,
500 km.
For each pollen record, one variable of SPD´s in time was calculated
for each distance class. Radiocarbon dates were calibrated using
calibrate function from rcarbon package with
appropriate calibration curves (“IntCal20”, “ShCal20”, “mixed”).
Calibration curves were obtained rcarbon package and
“mixed” was created using rcarbon::mixCurves function with
‘p’ = 0.5. Calibration curves were assigned by their geographical
location following Hua et al., 2013.
SPD is estimated using spd function from
rcarbon package for each distance class for each year
between a minimum threshold age and 12 ka. However, distance class with
less than 50 RC dates is filtered out in order to maintain robust SPD
estimation. The minimum threshold ages are different for different
regions and are decided based on the availability of radiocarbon dating
for different regions. In general there is a bias that radiocarbon
dating is rather limited on younger material where in many regions there
are a lack of C14 data during the last 2000 years. Table 3 show the ages
where data younger than age_from where removed.
Table 3: Minimum ages above which C14 data was removed (age_from) for
different regions
|
region
|
age_from
|
|
Europe
|
2000
|
|
Latin America
|
2000
|
|
Asia
|
2000
|
|
Africa
|
2000
|
|
North America
|
500
|
|
Oceania
|
500
|
In order to select distance from each pollen record, which will limit
the area of human activity relevant to that pollen record, we used the
expert-based detection of human events and the human impact detection
from indicators of pollen types to inform the estimation of SPD:
For each distance class of SPD of each pollen record, one Redundancy
Analysis (RDA) is estimated using vegan::rda function with
event types as responses (binary) and SPD values as predictors. Next, R2
is estimated using vegan::RsquareAdj function for each
distance class. Finally, the distance class with the highest R2 is
selected as the representation of human presence, and indirectly human
activity, for that pollen record.
This approach is not perfect and it is neglecting topographic
differences and presence of water bodies. However, this was selected as
a balance between simplicity and generality, and to avoid unnecessary
increase of complexity in choosing the distance from the records. See
demonstration of this method Detection
of past humans.
Paleo Climate
Paleoclimate from the CHELSA-TraCE21k downscaling algorithm is
downloaded from the CHELSA database (Karger et al. 2021, Karger et al.
2021). The selected bioclimatic variables are annual mean temperatures ℃
(bio1), minimum temperatures of coldest month ℃ (bio6), annual
precipitation kg m-2 year-1 (bio12), precipitation seasonality (bio15),
precipitation of warmest quarter kg m-2 quarter-1 (bio18) and
precipitation of coldest quarter kg m-2 quarter-1 (bio19), where we
extracted climate values for the coordinates for each dataset_id
retrieving the full time series of every 100 years. In addition, we
downloaded the monthly climatology for daily maximum near-surface
temperature K/10 (tasmin).
Pollen assemblage properties (PAP) estimation
To prepare the response variables of our main pollen dataset
compilation and to be able to analyse fundamental ecosystem properties,
we prepared the standard estimates of pollen assemblage properties (PAP)
(Bhatta et al. 2023). The PAP estimations provide different aspects of
pollen assemblage diversity which includes palynological richness,
diversity and evenness, compositional change and turnover, and
Rate-of-Change (RoC).
These response variables are calculated using the newly developed R-Ecopol
package that contain all the functions needed to estimate PAPs for
our pollen data assembly. The base functions used in this package are
derived from other dependency packages such as mvpart
package (Therneau et al. 2014) to estimate pollen zonations with
multivariate regression trees, vegan (Oksanen et al. 2022)
for other mutivariate techniques and dissimilarity indices,
R-Ratepol (Mottl 2021) to get the estimates of RoC,
functions from iNext (Chao et al. 2014) that have been
modified to extract interpolated Hill numbers based on a minimum sample
size, and newly developed R functions to run DCCA using
Canoco 4.5 (ter Braak xxxx) to list a few, among other,
dependency packages.
Pollen richness, diversity, and evenness
The different aspects of palynological diversity are estimated using
Hill´s effective species numbers N0, N1, N2, and the associated evenness
ratios of N2/N1 and N1/N0. These are combined through one equation where
the effective species numbers differ mainly in how the rare taxa are
weighted in the parameter q:
\[^q{D} = (\sum_{i=1}^{S}
p_{i}^{q})^{1/(1-q)}\]
When q is 0, rare and abundant taxa have equal weight and the number
is simply the number of taxa in the sample. The equation is not possible
to define for q = 1, but as it approaches 1, it is equal to the
exponential of the well-known Shannon index and reports the number of
equally common taxa. When q = 2, it is the same as the inverse Simpson
diversity index and provides the number of equally abundant taxa with a
low weight on rare taxa. The advantage of using effective species
numbers is that they provide easily interpretable units and contain the
doubling effect. To standardize the sample sizes, we use the rarefaction
approach developed by Chao et al. These estimates are rarefied to the
number of n = 150 grains, or in some cases to a lower sum (minimum n =
25). Some pollen records were only available as pollen percentages, and
as the sample size is unknown, these are then rarefied to the minimum
sum of percentages. The evenness ratios will be 1 if all taxa are
equally abundant, and the ratios hence indicate changes in abundances
between the numbers of rare, equally common, and abundant taxa.
We acknowledge that even though attempts are made to standardise
richness and diversity estimates based on standard sample size, there
are additional biases that are not taken into consideration such as
differences in total pollen production and pollen representation
(Odgaard 1998, 2001). In some cases, the total pollen sum is also too
low to be considered a robust estimate, but it was a choice made on
balancing loosing too much information from geographical areas with less
data coverage (see data filtering above).
Compositional change
Compositional change is calculated using multivariate regression
trees (MRT) with age as the constraining variable. MRT is in general a
robust tool to explore and predict changes in multivariate data sets
using environmental predictor variables (De´ath, Simpson and Birks
2012). This technique has been adopted in palaeoecology to detect major
zones in pollen diagrams or shifts between periods of homogeneous
vegetation in time (Simpson and Birks 2012). We use the pollen taxa in
percentages without any data transformations as the response and the
median ages derived from the age-depth model as the constraining
variable. The recursive partitioning are based on chi-square distances
between pollen samples constrained by time. The number of
cross-validation is set to 1000, and the optimal sized tree is chosen
based on the 1SD rule (Simpson and Birks 2012).
Compositional turnover
Compositional turnover is estimated using detrended canonical
correspondence analysis (DCCA) with age as the explanatory variable (ter
Braak and Smilauer 2007?). Changes in Weighted average (WA) sample
scores (CaseR scores sensu ter Braak and Smilauer 2012) are measures of
compositional turnover in standard deviation (SD) units (Birks 2007).
The WA scores are regressed with time using a second-order polynomial
(age+age^2) to allow more flexibility in the turnover pattern within a
pollen record. Total compositional turnover is a measure of the total
length of CaseR scores along the DCCA axis 1, whereas the pattern within
a record is the measures between the individual samples along the DCCA
axis 1. The response data are pollen percentages without any
transformation to maintain the chi-square distances between samples,
whereas the ages are the median ages derived from the age-depth model
for each site.
Rate-of-change
Rate-of-change for the pollen assemblages in the pollen records are
quantified using the novel R-Ratepol
package (Mottl et al. 2021). RoC is estimated using moving windows
of 500 years´time bins of five number of windows shifts where samples
are randomly selected for each bin. This approach is shown to increase
the correct detection of RoC peak-points than the more traditional
approaches (Mottle et al. 2021). RoC are reported as dissimilarity per
500 years using the Chord dissimilarity coefficient. Sample size is
standardized in each working unit to 150 grains or the lowest number
detected in each dataset. We use only the RoC scores further in the
analyses.
Change-points detection and density estimates
Change-points detection of all the PAP variables are calculated using
conventional regression trees (RT) for single variables with Euclidean
distances. An algorithm is made to detect the transitions between the
resulting groups (or zones) per variable, and these are coded as new
binary (0/1) variables. A change-point is defined as 1, where the mean
ages between the two consecutive samples are used as the timing of this
significant change. This is done individually for each PAP variable.
The density is calculated for each of these variables using a
Gaussian kernel, re-scaling them to each of the records specific age
ranges (i.e. minimum and maximum ages). To solve the boundary issue in
density estimation the data is reflected to 0. Hierarchical generalised
additive models (HGAM) are then used to produce two different fitted
variables where the first density get the common pattern of the density
estimates of all the variables reflecting significant changes in
richness, diversity, and evenness, and the second variable get the
common pattern of the densities of the variables reflecting significant
change in pollen assemblages (MRT, RoC, DCCA1).
Example of ten sites and PAP estimations (PENDING: CHANGE THIS FIGURE
TO A NEW EXAMPLE FROM HOPE DATASET):

Hierarchical variation partitioning
All response variables, except the two fitted diversity and
compositional density variables, have first been estimated on the main
harmonised pollen records for each sites using. To get the estimates of
equal spacing of 500 years, we then used linear interpolation. Equal
time spacing is necessary for the temporal analysis where we analyse
changes in PAP for different time steps in a spatial context (i.e within
ecozones).
To choose an interpolation method to get the data on equal time
steps, we compared the results of using GAMs versus simple linear
interpolations. Linear interpolation showed that the correlation
structure between the multivariate response variables are more similar
to the original estimates without equal spacing, and the univariate GAM
models could show unexpected patterns in single PAP estimates that
changed these correlations.Since we cannot individually assess single
GAM models for each variable for all the records, we choose the simplest
interpolation method to keep the data as close as possible to the
original estimates before using them in the multivariate models. We also
remove Africa as a region, because we do not have any data of past
human presence.
To test if ecological processes have changed due to past human
activity within cores, we use reduced rank multivariate regression. This
is also known as distance-based redundancy analysis (db-RDA). We used
the R package rdacca.hp to run hierarchial variation
partitioning with several predictors. This estimates the variation per
variables in different combinations to get the average variable
importance independent of the order of predictors. db-RDA was performed
using Gower-distances adding a constant because we are dealing
with a mix of numerical response variables. These are the PAPs with
varying units. Depending on the type of analyses (see below), the
explanatory variables are SPDs and palaeoclimatic variables. For within
record analysis (see below) we also include time as an explanatory
variable. SPD is then the important variable of main interest as it
represent past human presence. The palaeoclimate is a matrix of summer
precipitation, winter precipitation, annual temperatures and winter
temperatures. These are selected as we considered them most relevant to
represent differences in climatic conditions within all the regions
(regarding temperatures, seasonality and aridity). Time is represented
by the ages of each pollen record, however, this is more difficult to
interpret. We assume age may represent time dependent vegetation changes
such as natural successions and/or ecological changes due to interaction
between taxa.
The predictor variables can be applied either as individual
predictors or as groups as predictors. In our case, we run the analysis
with groups of predictors. This means that the palaeoclimatic
variables are included as one matrix and not assessed as individual
predictors. (Though the overall results does not change much). The
advantage using palaeoclimate as single predictors is that it is
possible to get which of climatic variables are important (high adj R2)
or statistically significant, if significant testing is turned on.
Statistical testing can be applied in two ways, both which shuffle the
predictor variables. Using time_series set as TRUE,
statistical testing is run with restricted Monte Carlo permutation for
time series analysis (sensu ter Braak XXXX) which has a cyclic behavior
of samples to keep the stratigraphical order intact. If
time_series is set as FALSE, predictor variables will be
randomly shuffled as many times set in the parameter
permutations (hence it use the default permutation in the
rdcca.hp package).
The analysis is run in two different ways to analyse: - 1) to analyse
spatial changes which run the hierarchical variation
partitioning within single record, and - 2) to analyse the temporal
patterns in space for each of the 500 year time steps.
In this latter analysis, we restructure the data as samples across
space within selected Koppen-Geiger division of five major ecozones on
each continent, whereas the analysis is run per time bin. The predictor
groups are past human presence and the matrix of palaeoclimatic
variables. For this, we filter out time bins which have less than 5
samples. For some bins, if all the spds equal zero, the analysis will
fail. These cases indicate insufficient numbers of predictors and will
return NA for these specific time bins.
Comments to the preliminary results
PENDING TASKS:
Some results do not make sense:
- We get negative variations, what do we do with this? One extreme outlier in North America also indicated that the individual percentage explained for humans was 2000 (therefore I did not use the individual percentage to report the results for the predictors)
- Since we aggregate and summaries the results per ecozone, how to recalculate this into percentages contribution for the humans? Can we use sum(unique/total explained)/number of datasets per ecozone as the total constraining variation vary per dataset. Other suggestions?
Remember the unique variation is when time and climate is removed and individual variation per predictor is the same as a single model with the one predictor.
I hope this report helps everyone to get the overview of the work behind HOPE Hypothesis 1. I am very unhappy about the presented figures at this stage so all thoughts, ideas, and (critical) comments about the methodology and display of (what is important) results are most welcome.